Abstract

Data mining is a technique for analyzing large datasets to identify patterns, information, and relationships that may be used to solve challenging problems. Identifying outliers has attracted the attention of researchers working in a variety of areas. Outliers are objects that behave differently from other objects. Rough set theory can cope with the ambiguity and uncertainty of real-world data. So far, research has focused solely on spotting outliers using the membership function. Outliers can, however, be recognized using both membership and non-membership values by applying the principle of the intuitionistic fuzzy proximity relation. At this step, the indiscernibility of objects is discovered, and the quantitative data are then converted to qualitative data. This article proposes outlier detection in a single universal set using an intuitionistic fuzzy proximity relation with a rough set, based on complement entropy and a weighted density approach. An empirical study ranking colleges on the evaluated parameters is presented.

Keywords: Outliers; Intuitionistic Fuzzy Proximity Relation; Membership Relation; Non-Membership Relation


1. Introduction 


Data mining techniques may be used to uncover hidden patterns in datasets. Data available in the real world often contain uncertainties that lead to ambiguity; if such data are not handled carefully, accuracy may be compromised and the analysis may fail (Garcia, Luengo, and Herrera). Most data mining research has concentrated on three tasks: classifying objects, grouping objects, and spotting outliers among objects. Clusters are formed when similar objects are gathered together. When the members of a cluster are examined more closely, however, some objects may differ from the rest in their characteristics; such objects are called outliers, also known as anomalies (Cios, W Pedrycz, and Swiniarski).

For example, in a dataset of people, birds, and animals, each group forms a separate cluster. A man with six fingers is classified as an outlier in the human group based on that particular characteristic. Similarly, objects in any cluster that deviate from the other objects on a certain property are referred to as outliers. The major purpose of clustering is to find the subgroups that exist within a dataset: objects within the same cluster are highly similar to one another compared with objects in other clusters.

Outliers are items whose behavior deviates significantly from that of other objects (Hawkins). Outlier detection is essential because the presence of outliers can cause the system to operate slowly. It is therefore important to identify and remove outliers from the dataset.


In 1965, Zadeh developed the notion of a fuzzy set to address the issue of uncertainty (Zadeh). Fuzzy sets, however, consider only the degree of membership, so information that is unavailable because of a lack of knowledge cannot be represented. In the real world, the non-membership value, which captures the deterministic aspect, is related to the membership value (S. K. Ghosh, Mitra, and A. K. Ghosh), and both values need to be addressed. Accordingly, in 1986 Atanassov introduced membership and non-membership relationships through the intuitionistic fuzzy set concept (Atanassov and Atanassov).


The fuzzy proximity relation is used to study the indiscernibility of objects. It has been extended to the intuitionistic fuzzy proximity relation, which outperforms fuzzy approximation on rough sets (Bello and Falcon). In this relation, the sum of the membership and non-membership values lies between 0 and 1 (Nanda and Majumdar), and the relation must also be symmetric. After the equivalence classes are identified, an ordering relation can be applied to the dataset.
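The conditions above can be checked mechanically. The following minimal sketch (the function name `is_valid_ifpr` is hypothetical) tests whether a pair of membership and non-membership matrices forms an intuitionistic fuzzy proximity relation: every degree lies in [0, 1], each sum μ + ν is at most 1, and both matrices are symmetric. Reflexivity (μ = 1, ν = 0 on the diagonal) is also checked as an assumption, since it holds in every IFPR table of Section 4.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def is_valid_ifpr(mu: Matrix, nu: Matrix, tol: float = 1e-9) -> bool:
    """Return True if (mu, nu) satisfies the IFPR conditions assumed here."""
    n = len(mu)
    for i in range(n):
        # Reflexivity (assumed): each object is fully similar to itself.
        if abs(mu[i][i] - 1.0) > tol or abs(nu[i][i]) > tol:
            return False
        for j in range(n):
            m, v = mu[i][j], nu[i][j]
            # Each degree lies in [0, 1] and their sum does not exceed 1.
            if not (-tol <= m <= 1.0 + tol and -tol <= v <= 1.0 + tol):
                return False
            if m + v > 1.0 + tol:
                return False
            # Symmetry of both the membership and non-membership relations.
            if abs(m - mu[j][i]) > tol or abs(v - nu[j][i]) > tol:
                return False
    return True


# Faculty values 60 and 50, with the degrees listed in Table 2.
mu = [[1.0, 0.833], [0.833, 1.0]]
nu = [[0.0, 0.083], [0.083, 0.0]]
print(is_valid_ifpr(mu, nu))  # True
```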

This article provides a method for converting quantitative data to qualitative data by combining the intuitionistic fuzzy proximity relation with an ordering relation. The rest of the article is structured as follows: Section 2 reviews the literature, Section 3 explains the proposed methodology, Section 4 presents an empirical study, and Section 5 concludes the article.


2. Literature Review 


Exceptional sets and minimal exceptional sets have been studied as a way to identify outliers by computing the exceptional degree of each object in the minimal exceptional sets (Jiang, Sui, and C. Cao). Outlier detection is widely used in statistics; however, methods based on well-known distributions are largely limited to univariate data. Because of this constraint, they cannot readily be applied to real-world data, which are often multivariate.


The distance-based approach identifies outliers by measuring how unusual an object is relative to its neighbors. Although it is a non-parametric technique, its computation time is long (Chandola, Banerjee, and Kumar). The use of intuitionistic fuzzy sets for multiattribute decision-making has also been investigated: various mathematical programming models have been built to obtain the optimal attribute weights, and associated decision-making methods have been proposed.
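The distance-based idea mentioned above can be illustrated with one common variant that scores each object by the distance to its k-th nearest neighbor. This is a generic sketch, not the method of the cited survey; the function name, the Euclidean distance, and the choice of k are assumptions. The nested pairwise loop costs O(n²) distance computations, which illustrates why such methods become slow on large datasets.

```python
import math
from typing import List, Sequence


def knn_distance_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Larger scores mean the point lies farther from its neighbors and is
    more likely to be an outlier. Requires 1 <= k <= len(points) - 1.
    """
    scores = []
    for i, p in enumerate(points):
        # All pairwise distances from p to the other points: O(n) per point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (5.0, 5.0)]
scores = knn_distance_scores(data, k=2)
print(max(range(len(data)), key=scores.__getitem__))  # 4, the isolated point
```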

According to fuzzy set theory, an element's membership in a fuzzy set is represented by a single value between zero and one. However, because there may be some degree of hesitation, the degree of non-membership of an element in a fuzzy set is not always equal to 1 minus the membership degree (Ejegwa et al.). Intuitionistic fuzzy sets (IFS) therefore extend fuzzy sets with this degree of hesitation, known as the hesitation margin, which is defined as 1 minus the sum of the membership and non-membership degrees.
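The hesitation margin can be made concrete with a small value type. In this sketch the class name `IFValue` is hypothetical; it enforces 0 ≤ μ, ν ≤ 1 and μ + ν ≤ 1 and derives π = 1 − μ − ν. An ordinary fuzzy set corresponds to the special case ν = 1 − μ, where π = 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFValue:
    """Degrees of one element in an intuitionistic fuzzy set (hypothetical helper)."""

    mu: float  # membership degree
    nu: float  # non-membership degree

    def __post_init__(self) -> None:
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("degrees must lie in [0, 1]")
        # Small tolerance so that sums like 0.7 + 0.3 are not rejected by rounding.
        if self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("mu + nu must not exceed 1")

    @property
    def hesitation(self) -> float:
        """Hesitation margin pi = 1 - mu - nu."""
        return 1.0 - self.mu - self.nu


x = IFValue(mu=0.6, nu=0.3)
print(round(x.hesitation, 2))  # 0.1
```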

Some researchers proposed a solution to decision-making problems by employing an intuitionistic fuzzy soft set over two universal sets, using accuracy and rough degrees to obtain the optimal solutions. They also built a binary relation, in the form of an intuitionistic fuzzy relation, between the two non-empty universal sets U and V (Liu).


3. Proposed Approach 


Consider a single universal set that contains quantitative data. In the preprocessing stage, the quantitative data are converted to qualitative data using the intuitionistic fuzzy proximity relation (Geetha, Acharjya, and Iyengar). For two objects O_i and O_j, the membership (μ) value is obtained using equation (1) and the non-membership (ν) value using equation (2):

$$\mu(O_i, O_j) = 1 - \frac{|O_i - O_j|}{\text{Maximum value of the parameter}} \qquad (1)$$

$$\nu(O_i, O_j) = \frac{|O_i - O_j|}{2 \times \text{Maximum value of the parameter}} \qquad (2)$$
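Equations (1) and (2) can be sketched for a single attribute column as follows. The function name `ifpr` is hypothetical, and "maximum value of the parameter" is taken to be the largest value of that attribute over all objects (assumed nonzero); with this reading the sketch reproduces the entries of Tables 2 to 6.

```python
from typing import List, Sequence, Tuple


def ifpr(values: Sequence[float]) -> Tuple[List[List[float]], List[List[float]]]:
    """Membership and non-membership matrices for one attribute.

    mu = 1 - |Oi - Oj| / M and nu = |Oi - Oj| / (2M), where M is the
    maximum value of the attribute (assumed to be greater than zero).
    """
    m = max(values)
    mu = [[1 - abs(a - b) / m for b in values] for a in values]
    nu = [[abs(a - b) / (2 * m) for b in values] for a in values]
    return mu, nu


faculty = [60, 60, 50, 50, 60]  # Faculty column of Table 1, C1..C5
mu, nu = ifpr(faculty)
print(round(mu[0][2], 3), round(nu[0][2], 3))  # 0.833 0.083
```

Because μ + ν = 1 − |O_i − O_j| / (2M), every pair produced this way automatically satisfies μ + ν ≤ 1.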


Based on the indiscernibility obtained, the quantitative data are converted to qualitative data using an ordering relation. In the post-processing stage, the indiscernibility relations, complement entropy values, and weighted density values of the attributes and objects are calculated (Zhao, Liang, and F. Cao). A threshold value is then fixed based on the computed weighted density values, and the weighted density value of each object is compared with the threshold to identify the outlier objects. Table 1 shows the dataset considered for evaluation, and Figure 1 shows the working model of the proposed methodology.

FIGURE 1. Working Model of the Proposed Methodology

4. Empirical Study


Consider the ranking of colleges based on the attributes faculty, education system, placement, infrastructure, and collaboration with industry. The colleges are represented as C = {C1, C2, C3, C4, C5}, and the attributes are represented as

{Faculty, Education System, Placement, Infrastructure, Industry Collaboration}


The computed IFPR values for the attributes faculty, education system, placement, infrastructure, and industry collaboration are shown in Tables 2, 3, 4, 5, and 6, respectively.

Taking membership (similarity) values greater than 0.83 and non-membership (dissimilarity) values less than 0.125, the following equivalence classes are obtained:

R1 = {{C1, C2, C5}, {C3, C4}}
R2 = {{C1, C2, C3}, {C4, C5}}
R3 = {{C1}, {C2, C3, C4, C5}}
R4 = {{C1, C2}, {C3, C4, C5}}
R5 = {{C1, C4, C5}, {C2, C3}}
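Grouping objects into equivalence classes can be sketched by linking every pair whose IFPR degrees pass a membership cutoff α and a non-membership cutoff β, then taking connected components of those links (a transitive closure, which is an assumption here). The function name is hypothetical. With the cutoffs 0.83 and 0.125 read literally, the Faculty values 60 and 50 (μ = 0.833, ν = 0.083) would pass both tests, so the sketch uses assumed cutoffs α = 0.9 and β = 0.05, which group only identical values and yield the classes R1 to R5.

```python
from typing import List, Sequence


def ifpr_classes(values: Sequence[float], alpha: float, beta: float) -> List[List[int]]:
    """Equivalence classes of objects under assumed IFPR cutoffs.

    Two objects are linked when mu >= alpha and nu <= beta, using
    equations (1) and (2). Classes are connected components of the links,
    returned as 1-based object indices (1 stands for C1).
    """
    m = max(values)
    n = len(values)
    seen = [False] * n
    classes = []
    for start in range(n):
        if seen[start]:
            continue
        stack, group = [start], []
        seen[start] = True
        while stack:  # depth-first search over linked objects
            i = stack.pop()
            group.append(i + 1)
            for j in range(n):
                d = abs(values[i] - values[j])
                mu, nu = 1 - d / m, d / (2 * m)
                if not seen[j] and mu >= alpha and nu <= beta:
                    seen[j] = True
                    stack.append(j)
        classes.append(sorted(group))
    return classes


print(ifpr_classes([60, 60, 50, 50, 60], alpha=0.9, beta=0.05))
# [[1, 2, 5], [3, 4]]
```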

Apply the ordering relation to convert the quantitative data to qualitative data. The ordering relation is as follows:

≺_Faculty: Good ≺ Average
≺_Education system: High ≺ Low
≺_Placement: Good ≺ Average
≺_Infrastructure: High ≺ Low
≺_Industry collaboration: Good ≺ Average


4.1. Identify indiscernible relations among objects

U/IND(Faculty) = {{C1, C2, C5}, {C3, C4}}
U/IND(Education System) = {{C1, C2, C3}, {C4, C5}}
U/IND(Placement) = {{C1}, {C2, C3, C4, C5}}
U/IND(Infrastructure) = {{C1, C2}, {C3, C4, C5}}
U/IND(Industry Collaboration) = {{C1, C4, C5}, {C2, C3}}


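4.2. Calculate complement entropy for the attributes

$$CE(\text{Faculty}) = \frac{3}{5}\left(1 - \frac{3}{5}\right) + \frac{2}{5}\left(1 - \frac{2}{5}\right) = \frac{3 \times 2 + 2 \times 3}{25} = \frac{12}{25}$$

$$CE(\text{Education System}) = \frac{12}{25}$$

$$CE(\text{Placement}) = \frac{8}{25}$$

$$CE(\text{Infrastructure}) = \frac{12}{25}$$

$$CE(\text{Industry Collaboration}) = \frac{12}{25}$$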
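The complement entropy in the Faculty calculation follows CE(A) = Σ (|X_i|/|U|)(1 − |X_i|/|U|) over the classes X_i of U/IND(A). A small sketch with exact fractions (the function name is hypothetical) computes it directly from the partitions of Section 4.1.

```python
from fractions import Fraction
from typing import Sequence, Set


def complement_entropy(partition: Sequence[Set[str]]) -> Fraction:
    """CE = sum over classes X of (|X|/|U|) * (1 - |X|/|U|)."""
    n = sum(len(block) for block in partition)  # |U|
    return sum(
        (Fraction(len(block), n) * (1 - Fraction(len(block), n)) for block in partition),
        Fraction(0),
    )


faculty = [{"C1", "C2", "C5"}, {"C3", "C4"}]
placement = [{"C1"}, {"C2", "C3", "C4", "C5"}]
print(complement_entropy(faculty), complement_entropy(placement))  # 12/25 8/25
```

Exact fractions are used so that the results match the hand calculation without floating-point rounding.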


4.3. Calculate the attribute's weight

$$\text{Weight}(\text{Faculty}) = \frac{13}{30}$$

$$\text{Weight}(\text{Education System}) = \frac{13}{30}$$
TABLE 1. Ranking of the Colleges

College  Faculty  Education System  Placement  Infrastructure  Industry Collaboration
C1       60       40                40         30              60
C2       60       40                30         30              50
C3       50       40                30         40              50
C4       50       30                30         40              60
C5       60       30                30         40              60


TABLE 2. The IFPR for the attribute faculty (each cell gives μ, ν)

R1        C1 (60)       C2 (60)       C3 (50)       C4 (50)       C5 (60)
C1 (60)   1, 0          1, 0          0.833, 0.083  0.833, 0.083  1, 0
C2 (60)   1, 0          1, 0          0.833, 0.083  0.833, 0.083  1, 0
C3 (50)   0.833, 0.083  0.833, 0.083  1, 0          1, 0          0.833, 0.083
C4 (50)   0.833, 0.083  0.833, 0.083  1, 0          1, 0          0.833, 0.083
C5 (60)   1, 0          1, 0          0.833, 0.083  0.833, 0.083  1, 0

TABLE 3. The IFPR for the attribute education system (each cell gives μ, ν)

R2        C1 (40)       C2 (40)       C3 (40)       C4 (30)       C5 (30)
C1 (40)   1, 0          1, 0          1, 0          0.750, 0.125  0.750, 0.125
C2 (40)   1, 0          1, 0          1, 0          0.750, 0.125  0.750, 0.125
C3 (40)   1, 0          1, 0          1, 0          0.750, 0.125  0.750, 0.125
C4 (30)   0.750, 0.125  0.750, 0.125  0.750, 0.125  1, 0          1, 0
C5 (30)   0.750, 0.125  0.750, 0.125  0.750, 0.125  1, 0          1, 0

TABLE 4. The IFPR for the attribute placement (each cell gives μ, ν)

R3        C1 (40)       C2 (30)       C3 (30)       C4 (30)       C5 (30)
C1 (40)   1, 0          0.750, 0.125  0.750, 0.125  0.750, 0.125  0.750, 0.125
C2 (30)   0.750, 0.125  1, 0          1, 0          1, 0          1, 0
C3 (30)   0.750, 0.125  1, 0          1, 0          1, 0          1, 0
C4 (30)   0.750, 0.125  1, 0          1, 0          1, 0          1, 0
C5 (30)   0.750, 0.125  1, 0          1, 0          1, 0          1, 0


$$\text{Weight}(\text{Placement}) = \frac{13}{30}$$

$$\text{Weight}(\text{Infrastructure}) = \frac{17}{30}$$

$$\text{Weight}(\text{Industry Collaboration}) = \frac{13}{30}$$


TABLE 5. The IFPR for the attribute infrastructure (each cell gives μ, ν)

R4        C1 (30)       C2 (30)       C3 (40)       C4 (40)       C5 (40)
C1 (30)   1, 0          1, 0          0.750, 0.125  0.750, 0.125  0.750, 0.125
C2 (30)   1, 0          1, 0          0.750, 0.125  0.750, 0.125  0.750, 0.125
C3 (40)   0.750, 0.125  0.750, 0.125  1, 0          1, 0          1, 0
C4 (40)   0.750, 0.125  0.750, 0.125  1, 0          1, 0          1, 0
C5 (40)   0.750, 0.125  0.750, 0.125  1, 0          1, 0          1, 0

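4.4. Calculate the Weight of Objects

$$W(C_1) = \left(\frac{3}{5} \times \frac{13}{30}\right) + \left(\frac{3}{5} \times \frac{13}{30}\right) + \left(\frac{1}{5} \times \frac{13}{30}\right) + \left(\frac{2}{5} \times \frac{17}{30}\right) + \left(\frac{3}{5} \times \frac{13}{30}\right) = 1.09$$

W(C2) = 1.26; W(C3) = 1.29; W(C4) = 1.26; W(C5) = 1.38

If the threshold value is 1.26, C1, whose weight is less than 1.26, is determined to be an outlier.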
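The object weights can be sketched as a weighted density: for each attribute, the size of the object's equivalence class divided by |U| is multiplied by the attribute weight, and the products are summed, matching the terms of the W(C1) computation above. The sketch takes the attribute weights of Section 4.3 as given inputs rather than deriving them, and the names `PARTITIONS`, `WEIGHTS`, `weighted_density`, and `outliers` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Set

# Partitions U/IND(a) from Section 4.1.
PARTITIONS: Dict[str, List[Set[str]]] = {
    "Faculty": [{"C1", "C2", "C5"}, {"C3", "C4"}],
    "Education System": [{"C1", "C2", "C3"}, {"C4", "C5"}],
    "Placement": [{"C1"}, {"C2", "C3", "C4", "C5"}],
    "Infrastructure": [{"C1", "C2"}, {"C3", "C4", "C5"}],
    "Industry Collaboration": [{"C1", "C4", "C5"}, {"C2", "C3"}],
}

# Attribute weights as listed in Section 4.3, taken as inputs here.
WEIGHTS: Dict[str, Fraction] = {
    "Faculty": Fraction(13, 30),
    "Education System": Fraction(13, 30),
    "Placement": Fraction(13, 30),
    "Infrastructure": Fraction(17, 30),
    "Industry Collaboration": Fraction(13, 30),
}


def weighted_density(obj: str) -> Fraction:
    """Sum over attributes of weight(a) * |[obj]_a| / |U|."""
    total = Fraction(0)
    for attr, blocks in PARTITIONS.items():
        n = sum(len(b) for b in blocks)  # |U|
        block = next(b for b in blocks if obj in b)  # class containing obj
        total += WEIGHTS[attr] * Fraction(len(block), n)
    return total


def outliers(objects: Sequence[str], threshold: float) -> List[str]:
    """Objects whose weighted density falls below the threshold."""
    return [o for o in objects if weighted_density(o) < threshold]


colleges = ["C1", "C2", "C3", "C4", "C5"]
print({c: round(float(weighted_density(c)), 2) for c in colleges})
print(outliers(colleges, threshold=1.26))  # ['C1']
```

Under these assumptions the sketch gives exact densities of 164/150, 190/150, 194/150, 194/150, and 207/150 for C1 to C5, and with the threshold 1.26 only C1 is flagged.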
TABLE 6. The IFPR for the attribute industry collaboration (each cell gives μ, ν)

R5        C1 (60)       C2 (50)       C3 (50)       C4 (60)       C5 (60)
C1 (60)   1, 0          0.833, 0.083  0.833, 0.083  1, 0          1, 0
C2 (50)   0.833, 0.083  1, 0          1, 0          0.833, 0.083  0.833, 0.083
C3 (50)   0.833, 0.083  1, 0          1, 0          0.833, 0.083  0.833, 0.083
C4 (60)   1, 0          0.833, 0.083  0.833, 0.083  1, 0          1, 0
C5 (60)   1, 0          0.833, 0.083  0.833, 0.083  1, 0          1, 0


TABLE 7. Qualitative Data

College  Faculty  Education System  Placement  Infrastructure  Industry Collaboration
C1       Good     High              Average    High            Good
C2       Good     High              Good       High            Average
C3       Average  High              Good       Low             Average
C4       Average  Low               Good       Low             Good
C5       Good     Low               Good       Low             Good


5. Conclusion 


This article suggests a method for finding outliers in a single universal dataset using intuitionistic fuzzy proximity relations with weighted density values of objects and attributes. The quantitative data are transformed into qualitative data by calculating membership and non-membership values, followed by an ordering relation. The indiscernibility relations are then found, the complement entropy and weighted density values of attributes and objects are computed, and a threshold value is fixed. The threshold value is compared with the computed weighted density value of each object to determine the outliers. In the empirical study on ranking colleges, the proposed methodology identifies C1 as an outlier. Future work will investigate extending the proposed approach to two universal sets.